Three Approaches to GO-Tagging Biomedical Abstracts
Authors
Abstract
In this paper we explore three approaches to assigning Gene Ontology semantic classifications to abstracts from the PubMed database: lexical lookup, information retrieval and machine learning. To evaluate the approaches we use two “gold” standards derived from the yeast genome database (SGD). While evaluation provides insights into the three approaches, it also reveals the difficulties in constructing a suitable gold standard for this task.
Similar resources
Identifying Experimental Techniques in Biomedical Literature
Named entity recognition of gene names, protein names, cell lines, and other biologically relevant concepts has received significant attention from the research community. In this work, we considered named entity recognition of experimental techniques in biomedical articles. In our system to mine gene and disease associations, each association is categorized by the techniques used to derive the a...
Tagging gene and protein names in biomedical text
MOTIVATION: The MEDLINE database of biomedical abstracts contains scientific knowledge about thousands of interacting genes and proteins. Automated text processing can aid in the comprehension and synthesis of this valuable information. The fundamental task of identifying gene and protein names is a necessary first step towards making full use of the information encoded in biomedical text. This ...
Tagging gene and protein names in full text articles
Current information extraction efforts in the biomedical domain tend to focus on finding entities and facts in structured databases or MEDLINE abstracts. We apply a gene and protein name tagger trained on MEDLINE abstracts (ABGene) to a randomly selected set of full text journal articles in the biomedical domain. We show the effect of adaptations made in response to the greater heterogeneity o...
Gene Ontology (GO) Annotation in Biomedical Literature
In this paper, we propose an approach to Gene Ontology (GO) annotation of biomedical texts. The GO is an effort to create a controlled terminology for labelling gene functions in a more precise way. Our system is based on the application of Parametrized Finite-State Graphs (P-FSG) for GO tagging. This process was applied to the annotation of genes related to Alzheimer's disease. This pro...
Functional gene clustering via gene annotation sentences, MeSH and GO keywords from biomedical literature
Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to...